What else is new than the hamming window? robust MFCCs for speaker recognition via multitapering

نویسندگان

  • Tomi Kinnunen
  • Rahim Saeidi
  • Johan Sandberg
  • Maria Hansson
چکیده

Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Multitaper methods form a spectrum estimate using multiple window functions and frequency-domain averaging. Multitapers provide a robust spectrum estimate but have not received much attention in speech processing. Our speaker recognition experiment on NIST 2002 yields equal error rates (EERs) of 9.66 % (clean data) and 16.41 % (-10 dB SNR) for the conventional Hamming method and 8.13 % (clean data) and 14.63 % (-10 dB SNR) using multitapers. Multitapering is a simple and robust alternative to the Hamming window method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multitaper MFCC Features for Acoustic Stress Recognition from Speech

Ameliorating the performances of speech recognition system is a challenging problem interesting recent researchers. In this paper, we compare two extraction methods of Mel Frequency Cepstral Coefficients used to represent stressed speech utterances in order to obtain best performances. The first method known as traditional is based on single window (taper) generally the Hamming window and the s...

متن کامل

Improving the performance of MFCC for Persian robust speech recognition

The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...

متن کامل

Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost speaker recognition methods are based on MFCCs even in noisy conditions. For MFCCs which dominantly capture vocal tract information, only the magnitude of the Fourier Transform of time-domain spee...

متن کامل

On the use of asymmetric-shaped tapers for speaker verification using i-vectors

This paper presents asymmetric-shaped tapers (or windows) for speaker recognition. Symmetric tapers (e.g., hamming), having the linear phase property and longer time delay, are widely used for short-time analysis of speech signals. Since human speech perception is relatively insensitive to short-time phase distortion, the linearity constraint on phase can be removed without any adverse effects....

متن کامل

Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environment. Based on the time-frequency multi-resolution property of wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristic of the signal, the Mel-Frequency Cepstral...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010